    A Recurrent Neural Network Survival Model: Predicting Web User Return Time

    The size of a website's active user base directly affects its value. Thus, it is important to monitor and influence a user's likelihood to return to a site. Essential to this is predicting when a user will return. Current state-of-the-art approaches to this problem come in two flavors: (1) Recurrent Neural Network (RNN) based solutions and (2) survival analysis methods. We observe that both techniques are severely limited when applied to this problem. Survival models can only incorporate aggregate representations of users instead of automatically learning a representation directly from a raw time series of user actions. RNNs can automatically learn features, but cannot be directly trained with examples of non-returning users who have no target value for their return time. We develop a novel RNN survival model that removes the limitations of the state-of-the-art methods. We demonstrate that this model can successfully be applied to return time prediction on a large e-commerce dataset with a superior ability to discriminate between returning and non-returning users than either method applied in isolation. Comment: Accepted into ECML PKDD 2018; 8 figures and 1 table
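
The abstract's point about non-returning users can be illustrated with a much simpler model than the paper's. The sketch below is not the RNN survival model; it assumes a constant-rate (exponential) return-time model, and the function names are hypothetical. It only shows how a censored example, with no observed return time, can still contribute to a likelihood through the survival function.

```python
import math


def censored_exponential_nll(rate, durations, returned):
    """Negative log-likelihood under an assumed exponential return-time model.

    durations[i] is the observed return time if returned[i] is True,
    otherwise the length of time the user was observed without returning.
    """
    nll = 0.0
    for t, did_return in zip(durations, returned):
        if did_return:
            # Observed return: log density, log(rate) - rate * t
            nll -= math.log(rate) - rate * t
        else:
            # Censored (no return yet): log survival, -rate * t
            nll -= -rate * t
    return nll


def exponential_mle(durations, returned):
    """Closed-form rate estimate: number of returns / total observed time."""
    return sum(returned) / sum(durations)


if __name__ == "__main__":
    durations = [2.0, 4.0, 6.0, 8.0]
    returned = [True, True, False, False]
    rate = exponential_mle(durations, returned)
    print(rate, censored_exponential_nll(rate, durations, returned))
```

Dropping the censored users instead would shrink the denominator and overstate the return rate, which is the kind of limitation the abstract attributes to methods that need a target value for every example.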

    Space-time modeling of soil moisture: Stochastic rainfall forcing with heterogeneous vegetation

    The present paper complements that of Isham et al. (2005), who introduced a space-time soil moisture model driven by stochastic space-time rainfall forcing with homogeneous vegetation and in the absence of topographical landscape effects. However, the spatial variability of vegetation may significantly modify the soil moisture dynamics with important implications for hydrological modeling. In the present paper, vegetation heterogeneity is incorporated through a two dimensional Poisson process representing the coexistence of two functionally different types of plants (e.g., trees and grasses). The space-time statistical structure of relative soil moisture is characterized through its covariance function which depends on soil, vegetation, and rainfall patterns. The statistical properties of the soil moisture process averaged in space and time are also investigated. These properties are especially important for any modeling that aggregates soil moisture characteristics over a range of spatial and temporal scales. It is found that particularly at small scales, vegetation heterogeneity has a significant impact on the averaged process as compared with the uniform vegetation case. Also, averaging in space considerably smoothes the soil moisture process, but in contrast, averaging in time up to 1 week leads to little change in the variance of the averaged process

    Stormwater in Silver Bow and Blacktail Creeks: Implications for the Microbial Community

    Silver Bow and Blacktail Creeks are the headwaters of the Clark Fork River and are impacted by historic mining activities in the area. Although metal concentrations of runoff into the creeks are monitored and reported in previous studies, the composition and diversity of microbial communities are unknown. We seek to identify the microbial communities present and investigate changes in community structure due to stormwater impact, thereby determining and monitoring the overall environmental health of the system. We sampled five sites in Silver Bow and Blacktail Creeks in Butte, MT for chemical and biological analyses during high stormwater flow events. Water samples were collected for analysis of major anions and cations, metal concentrations, dissolved inorganic and organic carbon and carbon isotopes and hydrogen and oxygen isotopes in water. In situ measurements of pH, temperature and dissolved oxygen were taken at the time of sampling. Redox sensitive species - total dissolved sulfide, dissolved silica and ferrous iron - were measured using wet chemical tests and field spectrophotometry. Concurrent biological samples were collected for microbial identification and diversity (DNA), activity (protein), quantity (cell counts) and culturing. Overall microbial results are in progress, but water chemistry data provide clues about microbial habitats available in the creeks. Results upstream in Butte will be compared to downstream areas such as Durant Canyon and the Warm Springs Settling Ponds. The relationship between water chemistry, microbes, and overall ecosystem health can be characterized by deciphering how water chemistry affects microbial activity and vice versa

    Detecting bias arising from delayed recording of time

    Sometimes in studies of the dependence of survival time on explanatory variables the natural time origin for defining entry into study cannot be observed and a delayed time origin is used instead. For example, diagnosis of disease may in some patients be made only at death. The effect of such delays is investigated both theoretically and in the context of the England and Wales National Cancer Register

    Combining frequency and time domain approaches to systems with multiple spike train input and output

    A frequency domain approach and a time domain approach have been combined in an investigation of the behaviour of the primary and secondary endings of an isolated muscle spindle in response to the activity of two static fusimotor axons when the parent muscle is held at a fixed length and when it is subjected to random length changes. The frequency domain analysis has an associated error process which provides a measure of how well the input processes can be used to predict the output processes and is also used to specify how the interactions between the recorded processes contribute to this error. Without assuming stationarity of the input, the time domain approach uses a sequence of probability models of increasing complexity in which the number of input processes to the model is progressively increased. This feature of the time domain approach was used to identify a preferred direction of interaction between the processes underlying the generation of the activity of the primary and secondary endings. In the presence of fusimotor activity and dynamic length changes imposed on the muscle, it was shown that the activity of the primary and secondary endings carried different information about the effects of the inputs imposed on the muscle spindle. The results presented in this work emphasise that the analysis of the behaviour of complex systems benefits from a combination of frequency and time domain methods

    ElasticMatrix: A MATLAB toolbox for anisotropic elastic wave propagation in layered media

    Simulating the propagation of elastic waves in multi-layered media has many applications. A common approach is to use matrix methods where the elastic wave-field within each material layer is represented by a sum of partial-waves along with boundary conditions imposed at each interface. While these methods are well-known, coding the required matrix formation, inversion, and analysis for general multi-layered systems is non-trivial and time-consuming. Here, a new open-source toolbox called ElasticMatrix is described which solves the problem of acoustic and elastic wave propagation in multi-layered media for isotropic and transverse-isotropic materials where the wave propagation occurs in a material plane of symmetry. The toolbox is implemented in MATLAB using an object oriented programming framework and is designed to be easy to use and extend. Methods are provided for calculating and plotting dispersion curves, displacement and stress fields, reflection and transmission coefficients, and slowness profiles

    Big data: Some statistical issues.

    A broad review is given of the impact of big data on various aspects of investigation. There is some but not total emphasis on issues in epidemiological research

    Research data management and openness: the role of data sharing in developing institutional policies and practices

    Purpose: To investigate the relationship between research data management (RDM) and data sharing in the formulation of RDM policies and development of practices in higher education institutions (HEIs). Design/methodology/approach: Two strands of work were undertaken sequentially: firstly, content analysis of 37 RDM policies from UK HEIs; secondly, two detailed case studies of institutions with different approaches to RDM based on semi-structured interviews with staff involved in the development of RDM policy and services. The data are interpreted using insights from Actor Network Theory. Findings: RDM policy formation and service development has created a complex set of networks within and beyond institutions involving different professional groups with widely varying priorities shaping activities. Data sharing is considered an important activity in the policies and services of HEIs studied, but its prominence can in most cases be attributed to the positions adopted by large research funders. Research limitations/implications: The case studies, as research based on qualitative data, cannot be assumed to be universally applicable but do illustrate a variety of issues and challenges experienced more generally, particularly in the UK. Practical implications: The research may help to inform development of policy and practice in RDM in HEIs and funder organisations. Originality/value: This paper makes an early contribution to the RDM literature on the specific topic of the relationship between RDM policy and services, and openness – a topic which to date has received limited attention

    A Quantile Variant of the EM Algorithm and Its Applications to Parameter Estimation with Interval Data

    The expectation-maximization (EM) algorithm is a powerful computational technique for finding the maximum likelihood estimates for parametric models when the data are not fully observed. The EM is best suited for situations where the expectation in each E-step and the maximization in each M-step are straightforward. A difficulty with the implementation of the EM algorithm is that each E-step requires the integration of the log-likelihood function in closed form. The explicit integration can be avoided by using what is known as the Monte Carlo EM (MCEM) algorithm. The MCEM uses a random sample to estimate the integral at each E-step. However, the problem with the MCEM is that it often converges to the integral quite slowly and the convergence behavior can also be unstable, which causes a computational burden. In this paper, we propose what we refer to as the quantile variant of the EM (QEM) algorithm. We prove that the proposed QEM method has an accuracy of $O(1/K^2)$ while the MCEM method has an accuracy of $O_p(1/\sqrt{K})$. Thus, the proposed QEM method possesses faster and more stable convergence properties when compared with the MCEM algorithm. The improved performance is illustrated through the numerical studies. Several practical examples illustrating its use in interval-censored data problems are also provided
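
The contrast between a random-sample and a quantile-based approximation of an E-step integral can be sketched on a toy case. The code below is an illustration under stated assumptions, not the paper's algorithm: it assumes a standard normal model, a single interval-censored observation known only to lie in (a, b) with finite a and b, and evenly spaced quantile levels (i - 0.5)/K. All function names are hypothetical.

```python
import random
from statistics import NormalDist

STD_NORMAL = NormalDist()  # assumed model: standard normal


def exact_interval_mean(a, b):
    """Closed-form E[X | a < X < b] for a standard normal X."""
    mass = STD_NORMAL.cdf(b) - STD_NORMAL.cdf(a)
    return (STD_NORMAL.pdf(a) - STD_NORMAL.pdf(b)) / mass


def truncated_quantile(u, a, b):
    """Quantile function of X restricted to (a, b)."""
    lo, hi = STD_NORMAL.cdf(a), STD_NORMAL.cdf(b)
    return STD_NORMAL.inv_cdf(lo + u * (hi - lo))


def monte_carlo_mean(a, b, k, seed=0):
    """Monte Carlo style estimate: average of k random draws from (a, b)."""
    rng = random.Random(seed)
    draws = (truncated_quantile(rng.random(), a, b) for _ in range(k))
    return sum(draws) / k


def quantile_mean(a, b, k):
    """Quantile style estimate: average of k evenly spaced quantiles."""
    levels = ((i - 0.5) / k for i in range(1, k + 1))
    return sum(truncated_quantile(u, a, b) for u in levels) / k


if __name__ == "__main__":
    a, b, k = 0.5, 2.0, 100
    exact = exact_interval_mean(a, b)
    print("exact      ", exact)
    print("monte carlo", monte_carlo_mean(a, b, k))
    print("quantile   ", quantile_mean(a, b, k))
```

The quantile estimate is deterministic, so repeated E-steps see no sampling noise, whereas the Monte Carlo estimate changes with the seed. In this toy case the quantile average is a midpoint rule applied to a smooth quantile function, which is in line with the faster rate quoted in the abstract.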